Major Enhancements in Feature Detection and JSON Output Management #9

Merged
merged 3 commits into from
Feb 17, 2024

Conversation

jorgeaduran
Contributor

@jorgeaduran jorgeaduran commented Feb 17, 2024

Major Enhancements in Feature Detection and JSON Output Management

Description

This PR introduces a series of comprehensive updates aimed at improving the efficiency, accuracy, and user control over feature detection and JSON output generation within our project. The changes span across various components, refining both the underlying logic for string extraction and the mechanisms for data representation. Below is a summary of the key enhancements:

Feature Detection Improvements

  • Optimized Unicode String Extraction: We've refined the extract_unicode_strings function to better handle UTF-16LE and UTF-16BE encodings, employing targeted regex patterns that enhance the accuracy of our string detection efforts.
  • Advanced Bytes Feature Evaluation: The evaluation method in BytesFeature now utilizes a sliding window approach, allowing us to detect specified byte sequences more flexibly across different contexts.
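
The sliding-window idea behind the bytes evaluation can be sketched as follows. This is a minimal illustration, not the actual `BytesFeature` code: the function name `contains_bytes` and its signature are assumptions, and treating an empty pattern as "no match" is a choice made for the sketch.

```rust
/// Returns true if `pattern` occurs anywhere inside `data`.
/// Hypothetical helper for illustration; not the project's `BytesFeature::evaluate`.
fn contains_bytes(data: &[u8], pattern: &[u8]) -> bool {
    // Guard first: `windows(0)` panics, and a pattern longer than the data cannot match.
    if pattern.is_empty() || pattern.len() > data.len() {
        return false;
    }
    // `windows(n)` yields every contiguous n-byte slice, advancing one byte at a time,
    // so the pattern is found regardless of where it starts in the buffer.
    data.windows(pattern.len()).any(|window| window == pattern)
}
```

Compared with matching only at the start of a buffer, the window scan finds the sequence at any offset, at the cost of a linear pass over the data.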

JSON Output Management

  • Enhanced JSON Generation for Map Features: With the new -f parameter, users can now filter map features by type, making the JSON output more relevant and manageable. The JSON dump itself is triggered by the -m flag and requires an output path given with the -o parameter.
  • Clean String Function: We've added a function to sanitize extracted strings, ensuring the output is free from null characters and non-printable ASCII characters.
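
A string sanitizer of the kind described above might look roughly like this. It is a sketch under assumptions: the name `clean_string` and the exact filtering rule (drop NUL and all ASCII control characters, keep everything else) are illustrative and may differ from the function added in this PR.

```rust
/// Removes NUL and other ASCII control characters from `input`.
/// Hypothetical sketch; the real function in the PR may filter differently.
fn clean_string(input: &str) -> String {
    input
        .chars()
        // `is_ascii_control` is true for U+0000..=U+001F and U+007F,
        // which covers NUL, bell, tab, newline and DEL.
        .filter(|c| !c.is_ascii_control())
        .collect()
}
```

Note that this variant also strips tabs and newlines; a stricter or looser rule is a one-line change in the filter.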

Safety and Usability Enhancements

  • Boundary Checks and Error Handling: Significant updates have been made to prevent buffer over-reads and integer overflows, particularly in the detect_ascii_len function, enhancing the overall safety of our operations.
  • CLI Options Expansion: The introduction of filter_map_features in CliOpts allows for even finer control over the features to be processed.
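
To illustrate the kind of boundary checks and checked arithmetic mentioned above, here is a minimal sketch. The function `ascii_len_checked`, its parameters, and the printable range 0x20..=0x7E are assumptions for the example; the real `detect_ascii_len` may have a different signature and rules.

```rust
/// Counts printable ASCII bytes starting at `offset`, looking at most `max_len` bytes ahead.
/// Returns None if `offset` lies outside the buffer or `offset + max_len` overflows.
/// Hypothetical sketch; not the project's `detect_ascii_len`.
fn ascii_len_checked(buf: &[u8], offset: usize, max_len: usize) -> Option<usize> {
    // `checked_add` yields None instead of wrapping when the sum overflows usize.
    let end = offset.checked_add(max_len)?.min(buf.len());
    // `get` yields None when `offset > end` (offset past the buffer), so there is no over-read.
    let window = buf.get(offset..end)?;
    // Stop at the first byte outside the printable ASCII range.
    Some(
        window
            .iter()
            .take_while(|&&b| (0x20..=0x7e).contains(&b))
            .count(),
    )
}
```

Returning `Option` lets the caller decide how to report an out-of-range request, for instance by mapping `None` to an error variant.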

Why This Matters

These updates collectively represent a significant leap forward in our project's capability to accurately detect and represent data features, catering to a broader range of encoding scenarios and user needs. By improving efficiency, accuracy, and control, we are setting a solid foundation for future developments and applications of our project.

Testing

  • Test cases cover a variety of scenarios, including different encoding formats, feature types, and JSON output configurations.

I look forward to your feedback and any further suggestions for improvement!

- Optimized extract_unicode_strings to improve efficiency in UTF-16LE and UTF-16BE string extraction, using refined regex patterns for more accurate detection.
- Introduced filter_map_features option in CliOpts for enhanced feature filtering capabilities.
- Added BufferOverFlowError to handle buffer overflow conditions more effectively.
- Implemented get_name method for Feature, facilitating JSON dumping of features.
- Refined JSON generation for map_features with -f parameter for feature type filtering and -m flag for triggering JSON dump, requiring -o parameter for output path specification.
- Streamlined read_string function for more efficient ASCII and Unicode string extraction, utilizing read_bytes output and ensuring exact chunking for UTF-16 conversion.
- Enhanced detect_ascii_len function with boundary checks and checked arithmetic, preventing over-reads and integer overflow, thus ensuring accurate ASCII length detection.
- Modified BytesFeature's evaluate method to use a sliding window for pattern detection, enhancing feature detection across various contexts.
- Implemented logic to filter out empty features before insertion, ensuring meaningful and relevant data processing.
- JSON generation now occurs only when the -o parameter is specified; the -f parameter is available to filter the output by feature type.
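
As an illustration of the "exact chunking for UTF-16 conversion" item, the following sketch decodes UTF-16LE bytes only when they split cleanly into 2-byte code units. The helper name `decode_utf16le` and the choice to reject odd-length input are assumptions, not the PR's `read_string` implementation.

```rust
/// Decodes `bytes` as UTF-16LE, returning None for odd-length or invalid input.
/// Hypothetical helper for illustration only.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    let chunks = bytes.chunks_exact(2);
    // A leftover byte means the input is not a whole number of 16-bit code units.
    if !chunks.remainder().is_empty() {
        return None;
    }
    // Build each code unit from its little-endian byte pair.
    let units: Vec<u16> = chunks
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // `from_utf16` fails on unpaired surrogates; map that failure to None.
    String::from_utf16(&units).ok()
}
```

For UTF-16BE the same shape applies with `u16::from_be_bytes`.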

These cumulative updates significantly improve the project's robustness, accuracy, and user control over feature detection and data representation.
@jorgeaduran
Contributor Author

P.S.: For this to compile, the smda PR must be accepted first.

@marirs marirs merged commit b6bb90c into marirs:master Feb 17, 2024
4 checks passed
@marirs
Owner

marirs commented Feb 17, 2024

Hey thanks again..

1 Quick Question

This which I refactored:

if let Some(Yaml::String(s)) = rule.meta.get(&Yaml::String("namespace".to_string())) {
    self.capability_namespaces.insert(rule.name.clone(), s.clone());
    let first_non_zero_address = caps
        .iter()
        .find(|&&(addr, _)| addr != 0)
        .map(|&(addr, _)| addr)
        .unwrap_or(0);

    let _ = self
        .capabilities_associations
        .entry(rule.name.clone())
        .or_insert_with(|| CapabilityAssociation {
            attack: local_attacks_set.clone(),
            mbc: local_mbc_set.clone(),
            namespace: s.clone(),
            name: rule.name.clone(),
            address: first_non_zero_address as usize,
        });
}

and this (which you reverted):

if let Some(namespace) = rule.meta.get(&Yaml::String("namespace".to_string())) {
    if let Yaml::String(s) = namespace {
        self.capability_namespaces
            .insert(rule.name.clone(), s.clone());
        let first_non_zero_address = caps
            .iter()
            .find(|&&(addr, _)| addr != 0)
            .map(|&(addr, _)| addr)
            .unwrap_or(0);

        let _ = self
            .capabilities_associations
            .entry(rule.name.clone())
            .or_insert_with(|| CapabilityAssociation {
                attack: local_attacks_set.clone(),
                mbc: local_mbc_set.clone(),
                namespace: s.clone(),
                name: rule.name.clone(),
                address: first_non_zero_address as usize,
            });
    }
}

are the same. Any reason why you reverted back?

@jorgeaduran
Contributor Author

Thank you for highlighting this. After a closer look, it seems the modifications indeed revolve around formatting rather than functional changes. It's possible that I worked on a version of the code that didn't include your recent changes, leading to the unintended omission of your refinements. I apologize for any confusion this may have caused and will make sure to synchronize changes more carefully in the future to avoid such oversights.

@marirs
Owner

marirs commented Feb 17, 2024

Nah - don't worry about that.. I'll make the update now!
Just thought maybe it broke something.
Thanks again so much for your awesome pushes :)

@marirs
Owner

marirs commented Feb 17, 2024

Done I've pushed this as well :)

Thanks

@jorgeaduran
Contributor Author

Thank you for your understanding and for handling the update. I'm glad it didn't cause any issues. I appreciate the collaboration and look forward to our continued work together. Thanks again! :)
